Detecting and Correcting Errors in an English Tectogrammatical Annotation
نویسنده
چکیده
We present our first experiments with detecting and correcting errors in a manual annotation of English texts, taken from the Penn Treebank, at the dependency-based tectogrammatical layer, as it is defined in the Prague Dependency Treebank. The main idea is that errors in the annotation usually result in an inconsistency, i. e. the state when a phenomenon is annotated in different ways at several places in a corpus. We describe our algorithm for detecting inconsistencies (it got positive feedback from annotators) and we present some statistics on the manually corrected data and results of a tectogrammatical analyzer which uses these data for its operation. The corrections have improved the data just slightly so far, but we outline some ways to more significant improvement.
منابع مشابه
Ways to Improve the Quality of English-Czech Machine Translation
This thesis describes English-Czech Machine Translation as it is implemented in TectoMT system. The transfer uses deep-syntactic dependency (tectogrammatical) trees and exploits the annotation scheme of Prague Dependency Treebank. The primary goal of the thesis is to improve the translation quality using both rule-base and statistical methods. First, we present a manual annotation of translatio...
متن کاملTowards English-to-Czech MT via Tectogrammatical Layer
We present an overview of an English-to-Czech machine translation system. The system relies on transfer at the tectogrammatical (deep syntactic) layer of the language description. We report on the progress of linguistic annotation of English tectogrammatical layer and also on first end-to-end evaluation of our syntax-based MT system.
متن کاملInconsistency Detection in Semantic Annotation
Inconsistencies are part of any manually annotated corpus. Automatically finding these inconsistencies and correcting them (even manually) can increase the quality of the data. Past research has focused mainly on detecting inconsistency in syntactic annotation. This work explores new approaches to detecting inconsistency in semantic annotation. Two ranking methods are presented in this paper: a...
متن کاملAn approach to fault detection and correction in design of systems using of Turbo codes
We present an approach to design of fault tolerant computing systems. In this paper, a technique is employed that enable the combination of several codes, in order to obtain flexibility in the design of error correcting codes. Code combining techniques are very effective, which one of these codes are turbo codes. The Algorithm-based fault tolerance techniques that to detect errors rely on the c...
متن کاملTraining Paradigms for Correcting Errors in Grammar and Usage
This paper proposes a novel approach to the problem of training classifiers to detect and correct grammar and usage errors in text by selectively introducing mistakes into the training data. When training a classifier, we would like the distribution of examples seen in training to be as similar as possible to the one seen in testing. In error correction problems, such as correcting mistakes mad...
متن کامل